perf(copilot): read chat transcripts from copilot_messages (R+1 cutover)#4808

Merged
waleedlatif1 merged 2 commits into staging from waleedlatif1/copilot-messages-cutover-prep
May 30, 2026

Conversation

@waleedlatif1
Collaborator

@waleedlatif1 waleedlatif1 commented May 30, 2026

What

Cuts over user-facing copilot chat reads from the legacy copilot_chats.messages JSONB array to the normalized copilot_messages table. This is the R+1 read cutover — the payoff for the table + seq ordinal work that already shipped.

Why

copilot_chats is 5.7 GB, and 99% of that is the messages JSONB stored in TOAST. Every chat load detoasted and decompressed the whole array. Reading from copilot_messages via the (chat_id, seq) index avoids that entirely; the biggest wins are on large/tail chats and in keeping the base table lean.

How

  • New helper loadCopilotChatMessages(chatId) in lifecycle.ts reads content from copilot_messages ordered by seq ASC NULLS LAST, created_at ASC, id ASC (the verified canonical order; raw sql fragment because Drizzle's asc() omits NULLS LAST).
  • Both detail getters (getAccessibleCopilotChat, getAccessibleCopilotChatWithMessages) drop messages from the metadata select (no more detoast) and assemble the transcript from the table after authorization (no wasted query on denied access).
  • This cascades to the copilot GET (/api/copilot/chat), mothership GET (/api/mothership/chats/[chatId]), and resolveOrCreateChat's conversationHistory (the LLM payload) — all via the two getters.
  • New-chat insert uses a dedicated returning column set so a freshly-created chat returns messages: [] without a second query.
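To illustrate the ordering rule, here is a minimal in-memory sketch of seq ASC NULLS LAST, created_at ASC, id ASC. It is not the code in lifecycle.ts (which issues the ORDER BY in SQL through Drizzle); the MessageRow shape, compareCanonical, and toTranscript are hypothetical names used only for this example, and the id tiebreak assumes plain string comparison, which may differ from the database collation.

```typescript
// Hypothetical row shape; the real copilot_messages columns may differ.
interface MessageRow {
  id: string
  seq: number | null
  createdAt: Date
  content: Record<string, unknown>
}

// In-memory equivalent of: ORDER BY seq ASC NULLS LAST, created_at ASC, id ASC
function compareCanonical(a: MessageRow, b: MessageRow): number {
  if (a.seq !== b.seq) {
    if (a.seq === null) return 1 // a NULL seq sorts after any real seq
    if (b.seq === null) return -1
    return a.seq - b.seq
  }
  // Same seq (or both NULL): fall back to created_at, then id.
  const byCreated = a.createdAt.getTime() - b.createdAt.getTime()
  if (byCreated !== 0) return byCreated
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0
}

// Sort a copy of the rows and keep only the message payloads.
function toTranscript(rows: MessageRow[]): Record<string, unknown>[] {
  return [...rows].sort(compareCanonical).map((row) => row.content)
}

const sampleRows: MessageRow[] = [
  { id: 'c', seq: null, createdAt: new Date('2026-01-02T00:00:00Z'), content: { n: 'c' } },
  { id: 'e', seq: 2, createdAt: new Date('2026-01-01T00:00:03Z'), content: { n: 'e' } },
  { id: 'a', seq: 1, createdAt: new Date('2026-01-01T00:00:05Z'), content: { n: 'a' } },
  { id: 'd', seq: null, createdAt: new Date('2026-01-01T00:00:00Z'), content: { n: 'd' } },
  { id: 'b', seq: 2, createdAt: new Date('2026-01-01T00:00:03Z'), content: { n: 'b' } },
]

const orderedNames = toTranscript(sampleRows).map((m) => m.n)
// orderedNames is ['a', 'b', 'e', 'd', 'c']: real seq values first, NULL seq rows last.
```

The created_at and id tiebreakers only matter when seq is NULL or duplicated, which the prod integrity check reports as not occurring; they keep the order deterministic if that ever changes.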

The normalize → effective-transcript pipeline is unchanged and source-agnostic (copilot_messages.content is the same shape as a JSONB array element), so transcripts are byte-identical.

Scope / safety

  • Dual-write stays on; the JSONB column stays written — it remains the source for internal-logic reads (terminal-state, fork, cleanup, workspace-vfs) and a fallback. Removing JSONB writes is a later step.
  • No feature flag (per direction). Reverting this PR sends reads straight back to JSONB, with no data implications.

Integrity verified on prod before cutover

0 messages missing from the table · 0 NULL-seq · 0 duplicate keys · 0 duplicate seq within a chat · 0 orphans · order-parity vs JSONB = 0 mismatches.
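For readers who want to see what such checks look like, here is a rough per-chat sketch of three of them (missing rows, NULL seq, duplicate seq) plus order parity. The actual verification was run against prod and its queries are not shown in this PR; checkChatParity, TableRow, and the assumption that each message carries a stable id in both the JSONB array and the table are hypothetical.

```typescript
// Hypothetical shapes: assumes each message has a stable id in both sources.
interface TableRow {
  messageId: string
  seq: number | null
}

interface ParityReport {
  missingFromTable: number
  nullSeq: number
  duplicateSeq: number
  orderMatches: boolean
}

// legacyIds: message ids in JSONB array order; rows: copilot_messages rows for the same chat.
function checkChatParity(legacyIds: string[], rows: TableRow[]): ParityReport {
  const tableIds = new Set(rows.map((r) => r.messageId))
  const missingFromTable = legacyIds.filter((id) => !tableIds.has(id)).length
  const nullSeq = rows.filter((r) => r.seq === null).length

  // Count every row whose seq was already seen earlier in the same chat.
  const seen = new Set<number>()
  let duplicateSeq = 0
  for (const r of rows) {
    if (r.seq === null) continue
    if (seen.has(r.seq)) duplicateSeq += 1
    seen.add(r.seq)
  }

  // NULLS LAST: treat a NULL seq as larger than any real seq for sorting.
  const key = (r: TableRow) => r.seq ?? Number.MAX_SAFE_INTEGER
  const tableOrder = [...rows].sort((a, b) => key(a) - key(b)).map((r) => r.messageId)
  const orderMatches =
    tableOrder.length === legacyIds.length && tableOrder.every((id, i) => id === legacyIds[i])

  return { missingFromTable, nullSeq, duplicateSeq, orderMatches }
}

const cleanReport = checkChatParity(
  ['m1', 'm2', 'm3'],
  [
    { messageId: 'm2', seq: 1 },
    { messageId: 'm1', seq: 0 },
    { messageId: 'm3', seq: 2 },
  ],
)

const brokenReport = checkChatParity(
  ['m1', 'm2', 'm3'],
  [
    { messageId: 'm1', seq: 0 },
    { messageId: 'm3', seq: 0 },
    { messageId: 'm4', seq: null },
  ],
)
```

In this sketch a healthy chat yields all-zero counts with orderMatches true, which corresponds to the "0 mismatches" figures above; the orphan and duplicate-key checks need cross-chat data and are left out.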

Tests

  • New lifecycle.test.ts: getters source messages from the table in order; empty chat → []; auth-deny → null with no messages query; legacy getter; resolveOrCreateChat existing (table-sourced history) vs new (empty, no read).
  • Full suite: 472 files / 7,285 tests pass. Type-check clean, biome clean, check:api-validation passes.

Post-deploy verification

Staging smoke test: load a large chat via both GETs and confirm the transcripts are identical; confirm EXPLAIN shows copilot_messages_chat_seq_idx and no detoast of copilot_chats.messages. Re-run the low-load TABLESAMPLE parity spot-check (currently 0).

Flip user-facing chat reads from the legacy copilot_chats.messages JSONB
array (5.7GB, 99% TOAST) to the normalized copilot_messages table via a
new loadCopilotChatMessages helper ordered by seq NULLS LAST, created_at,
id — the verified canonical order. Both chat-detail getters
(getAccessibleCopilotChat, getAccessibleCopilotChatWithMessages) now drop
the messages column from their metadata select (no more whole-array
detoast on every load) and assemble the transcript from the table after
authorization. This cascades to the copilot + mothership GET endpoints
and to resolveOrCreateChat's conversationHistory (the LLM payload).

The normalize/effective-transcript pipeline is source-agnostic
(copilot_messages.content == a JSONB array element), so transcripts are
byte-identical. Dual-write and the JSONB column stay in place as the
internal-logic source and fallback; removing JSONB writes is a later step.

Prod integrity verified before cutover: 0 messages missing, 0 NULL-seq,
0 dup keys/seq, 0 orphans, order-parity vs JSONB = 0 mismatches.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@vercel

vercel Bot commented May 30, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Skipped | Skipped | May 30, 2026 3:00am |

@cursor

cursor Bot commented May 30, 2026

PR Summary

Medium Risk
Switches the primary read path for all user-facing chat transcripts; mis-ordering or drift from dual-write JSONB would change LLM/history payloads, though shape is unchanged and JSONB remains written.

Overview
User-facing copilot chat loads now build transcripts from copilot_messages instead of detoasting copilot_chats.messages. In lifecycle.ts, loadCopilotChatMessages loads non-deleted rows ordered by seq (NULLS LAST), created_at, and id; getAccessibleCopilotChat and getAccessibleCopilotChatWithMessages stop selecting the JSONB blob, authorize first, then attach the table-backed message list. That path feeds copilot/mothership GETs and resolveOrCreateChat’s conversationHistory; new chats return messages: [] from insert without a messages query.

Adds lifecycle.test.ts for ordering, empty transcripts, auth failures skipping the messages query, legacy getter behavior, and create vs load paths. Dual-write to JSONB is unchanged for other readers.

Reviewed by Cursor Bugbot for commit 2e7f4ec. Configure here.

@greptile-apps
Contributor

greptile-apps Bot commented May 30, 2026

Greptile Summary

This PR cuts over copilot chat transcript reads from the legacy copilot_chats.messages JSONB column to the normalized copilot_messages table, avoiding the costly TOAST decompression on every chat load while keeping the dual-write intact for fallback and internal-logic reads.

  • New loadCopilotChatMessages(chatId) reads from copilot_messages in (seq ASC NULLS LAST, created_at ASC, id ASC) order; both detail getters (getAccessibleCopilotChat, getAccessibleCopilotChatWithMessages) drop messages from the metadata select and call it only after authorization succeeds.
  • A separate copilotChatDetailReturningColumns set is introduced for new-chat inserts so a fresh row returns messages: [] directly from the RETURNING clause without issuing a second query.
  • New lifecycle.test.ts verifies table-sourced messages, empty transcripts, chat-not-found and auth-denied no-query guards, the legacy getter, and both existing/new paths in resolveOrCreateChat.

Confidence Score: 5/5

Safe to merge — reads from the normalized table only after authorization succeeds, dual-write and JSONB column are untouched, and pre-cutover integrity was verified on prod.

The change is a well-scoped read-path swap: the JSONB column remains written and available as a fallback, authorization always gates the new messages query, and the normalized table was fully validated against the JSONB source before cutover. The implementation is clean, the tests exercise the critical invariants (including the auth-denied no-query contract added in the head SHA), and the type-check and full suite pass.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| apps/sim/lib/copilot/chat/lifecycle.ts | Introduces loadCopilotChatMessages helper reading from copilot_messages; both detail getters and resolveOrCreateChat correctly sequence authorization before the messages query; the NULLS LAST raw SQL fragment is the correct Drizzle pattern for this ordering requirement. |
| apps/sim/lib/copilot/chat/lifecycle.test.ts | New test file covering all key invariants: messages sourced from the table in order, empty transcript, chat-not-found no-query guard, auth-denied no-query guard (added in the head SHA per previous thread), legacy getter, and both branches of resolveOrCreateChat. |

Sequence Diagram

sequenceDiagram
    participant Caller
    participant getAccessibleCopilotChatWithMessages
    participant copilot_chats DB
    participant authorizeCopilotChatRow
    participant loadCopilotChatMessages
    participant copilot_messages DB

    Caller->>getAccessibleCopilotChatWithMessages: (chatId, userId)
    getAccessibleCopilotChatWithMessages->>copilot_chats DB: SELECT metadata columns WHERE id=chatId AND userId=userId LIMIT 1
    copilot_chats DB-->>getAccessibleCopilotChatWithMessages: chat row (no messages JSONB)
    getAccessibleCopilotChatWithMessages->>authorizeCopilotChatRow: (chat, chatId, userId)
    alt not found or auth denied
        authorizeCopilotChatRow-->>getAccessibleCopilotChatWithMessages: null
        getAccessibleCopilotChatWithMessages-->>Caller: null (no messages query)
    else authorized
        authorizeCopilotChatRow-->>getAccessibleCopilotChatWithMessages: authorized row
        getAccessibleCopilotChatWithMessages->>loadCopilotChatMessages: (chatId)
        loadCopilotChatMessages->>copilot_messages DB: SELECT content WHERE chat_id=chatId AND deleted_at IS NULL ORDER BY seq ASC NULLS LAST, created_at ASC, id ASC
        copilot_messages DB-->>loadCopilotChatMessages: [{content}, ...]
        loadCopilotChatMessages-->>getAccessibleCopilotChatWithMessages: "Record<string,unknown>[]"
        getAccessibleCopilotChatWithMessages-->>Caller: "{...authorizedRow, messages}"
    end
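
The flow in the diagram above can be sketched in TypeScript as follows. This is an illustration with injected, in-memory stand-ins, not the code in lifecycle.ts: ChatReadDeps, getChatWithMessages, createFakeDeps, and the ChatMetadata fields are hypothetical, and the real authorization logic is more involved than an owner check.

```typescript
type Message = Record<string, unknown>

// Hypothetical metadata shape; note there is no messages JSONB column here.
interface ChatMetadata {
  id: string
  userId: string
  title: string | null
}

// Hypothetical seams standing in for the two tables and the authorization helper.
interface ChatReadDeps {
  fetchMetadata: (chatId: string) => Promise<ChatMetadata | null>
  authorize: (chat: ChatMetadata | null, userId: string) => Promise<ChatMetadata | null>
  loadMessages: (chatId: string) => Promise<Message[]>
}

async function getChatWithMessages(
  deps: ChatReadDeps,
  chatId: string,
  userId: string,
): Promise<(ChatMetadata & { messages: Message[] }) | null> {
  const chat = await deps.fetchMetadata(chatId)
  const authorized = await deps.authorize(chat, userId)
  // Not found or denied: return before the messages query is ever issued.
  if (!authorized) return null
  const messages = await deps.loadMessages(chatId)
  return { ...authorized, messages }
}

// In-memory fakes that count how often the messages read runs.
function createFakeDeps() {
  let loadCalls = 0
  const chat: ChatMetadata = { id: 'chat-1', userId: 'owner', title: 'Demo' }
  const deps: ChatReadDeps = {
    fetchMetadata: async (chatId) => (chatId === chat.id ? chat : null),
    authorize: async (row, userId) => (row && row.userId === userId ? row : null),
    loadMessages: async () => {
      loadCalls += 1
      return [{ role: 'user', content: 'hi' }]
    },
  }
  return { deps, getLoadCalls: () => loadCalls }
}
```

Calling getChatWithMessages(deps, 'chat-1', 'intruder') with these fakes resolves to null and leaves the load counter at zero, which is the same contract the auth-denied test in lifecycle.test.ts exercises.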

Reviews (2): Last reviewed commit: "test(copilot): cover auth-deny on a foun..." | Re-trigger Greptile

Comment thread apps/sim/lib/copilot/chat/lifecycle.test.ts
Address PR review: exercise the `if (!authorized) return null` contract —
when the chat row exists but authorization fails, the getter returns null
and never issues the copilot_messages read.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@waleedlatif1
Collaborator Author

@greptile

@waleedlatif1
Collaborator Author

@cursor review

@cursor cursor Bot left a comment

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 2e7f4ec. Configure here.

@waleedlatif1 waleedlatif1 merged commit 640b7e1 into staging May 30, 2026
14 checks passed
@waleedlatif1 waleedlatif1 deleted the waleedlatif1/copilot-messages-cutover-prep branch May 30, 2026 17:10